Appears in ACL ’94
Similarity-Based Estimation of Word Cooccurrence Probabilities
Abstract
In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination according to its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in a given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe a probabilistic word association model based on distributional word similarity, and apply it to improving probability estimates for unseen word bigrams in a variant of Katz's back-off model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error.
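The abstract describes the method only at a high level. As a rough illustration, the Python sketch below combines a Katz-style back-off with a similarity-based estimate for unseen bigrams: seen bigrams keep a discounted maximum-likelihood estimate, and the freed probability mass is spread over unseen successors in proportion to what the most similar histories predict. This is not the authors' implementation. The function names are hypothetical, and the absolute discount `d` (standing in for Katz's Good-Turing discounting), the Jensen-Shannon divergence as the similarity measure, the weight `exp(-beta * D)`, the neighbor count `k`, and the unigram mixing weight `gamma` are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def bigram_counts(tokens):
    """Count bigrams and unigrams in a token list."""
    return Counter(zip(tokens, tokens[1:])), Counter(tokens)


def conditional(bigrams):
    """Maximum-likelihood P(w2 | w1) as {w1: {w2: prob}}."""
    totals = Counter()
    for (w1, _), c in bigrams.items():
        totals[w1] += c
    cond = defaultdict(dict)
    for (w1, w2), c in bigrams.items():
        cond[w1][w2] = c / totals[w1]
    return dict(cond)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two sparse distributions.
    Assumption: used instead of KL divergence so zero probabilities cause no trouble."""
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in set(p) | set(q)}

    def kl(a):
        return sum(a[w] * math.log(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def p_sim(w1, cond, k=2, beta=1.0):
    """Similarity-weighted P(w2 | w1) averaged over the k histories closest to w1."""
    p1 = cond.get(w1, {})
    others = [v for v in cond if v != w1]
    nearest = sorted(others, key=lambda v: js_divergence(p1, cond[v]))[:k]
    # Hypothetical weighting: closer neighbors (smaller divergence) count more.
    weights = {v: math.exp(-beta * js_divergence(p1, cond[v])) for v in nearest}
    norm = sum(weights.values())
    est = Counter()
    for v, wt in weights.items():
        for w2, p in cond[v].items():
            est[w2] += wt * p / norm
    return est


def backoff_prob(w1, w2, bigrams, unigrams, cond, d=0.5, gamma=0.2, k=2, beta=1.0):
    """Back-off estimate: discounted ML for seen bigrams, similarity-based for unseen ones."""
    total = sum(unigrams.values())
    history = sum(c for (a, _), c in bigrams.items() if a == w1)
    seen = {b for (a, b) in bigrams if a == w1}
    if w2 in seen:
        # Absolute discounting, a stand-in for Katz's Good-Turing discount.
        return (bigrams[(w1, w2)] - d) / history
    # Mass freed by discounting; all of it is free if w1 was never a history.
    leftover = d * len(seen) / history if history else 1.0
    sim = p_sim(w1, cond, k, beta)

    def p_r(w):
        # Assumed redistribution model: unigram mixed with the similarity estimate.
        return gamma * unigrams[w] / total + (1 - gamma) * sim.get(w, 0.0)

    unseen_mass = sum(p_r(w) for w in unigrams if w not in seen)
    return leftover * p_r(w2) / unseen_mass


if __name__ == "__main__":
    toks = "eat the peach . eat the peach . eat a pear . eat the pear . walk the beach .".split()
    bg, ug = bigram_counts(toks)
    cd = conditional(bg)
    for word in ("peach", "beach"):
        print(word, backoff_prob("a", word, bg, ug, cd))
```

In this toy corpus neither "a peach" nor "a beach" occurs, but "a" and "the" both precede "pear", so "the" is the nearest neighbor of "a" and its preference for "peach" over "beach" carries over. Because the back-off weight divides by the redistribution mass over unseen successors only, the estimates for a given history still sum to one.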
Similar resources
Exact maximum coverage probabilities of confidence intervals with increasing bounds for Poisson distribution mean
The Poisson distribution is widely used as a standard model for analyzing count data, so estimation of the Poisson distribution parameter is widely applied in practice. Providing accurate confidence intervals for the parameters of discrete distributions is very difficult. So far, many asymptotic confidence intervals for the mean of the Poisson distribution have been proposed. It is known that the coverag...
Development of an Index-based Regression Model for Soil Moisture Estimation Using MODIS Imageries by Considering Soil Texture Effects
Soil moisture content (SMC) is one of the most significant variables in drought assessment and climate change. Accurate, near-real-time monitoring of this quantity by means of remote sensing (RS) is a useful strategy at regional scales. So far, various methods for SMC estimation using RS data have been developed. The use of spectral information based on a small range of electromagnetic...
An ontological hybrid recommender system for dealing with cold start problem
Recommender systems (RSs) are expected to suggest suitable goods to consumers. Cold start is the most important challenge for RSs. Recent hybrid RSs combine and . We introduce an ontological hybrid RS where the ontology has been employed in its part while improving the ontology structure by its part. In this paper, a new hybrid approach is proposed based on the combination of demog...
Correlation between IP and Rs and grade data in modeling and evaluation of a copper deposit, case study: the Sarbisheh copper deposit, Iran
This paper addresses the application of integrated chargeability and resistivity methods and grade data in the modeling and evaluation of copper deposits. We argue that the relationship between IP, Rs and grade data may be used for modeling and reserve estimation, and we test this argument on the Sarbisheh copper deposit, located in eastern Iran. Geology and mineralization situation of Sarbisheh d...
Development and evaluation of solar radiation estimation models based on sunshine hours and meteorological data
Global solar radiation (Rs) has wide applications in several disciplines. Measured or predicted Rs data are widely used by solar engineers, architects, agriculturists and hydrologists. Because of the importance of Rs, several empirical models have been developed to predict its values all over the world. In this study, the Angstrom model was calibrated based on the ratio of actual and possible...